Nvidia’s New AI Server Delivers 10x Performance Boost for Mixture-of-Experts Models
Nvidia's latest benchmark data reveals its new AI server achieves a tenfold performance improvement for mixture-of-experts AI models. The system packs 72 of Nvidia's top-tier chips into a single interconnected unit, specifically optimized for inference tasks rather than model training.
The timing is strategic. As the AI industry shifts focus from training to deployment, Nvidia faces mounting competition from AMD and Cerebras. The mixture-of-experts approach, popularized by China's DeepSeek open-source model in early 2025, routes each query to a small subset of specialized expert networks rather than activating the entire model, cutting the compute required per response.
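To make the routing idea concrete, here is a minimal, illustrative sketch of sparse top-k mixture-of-experts inference. The function names, shapes, and the use of a simple linear gating network are assumptions for demonstration; they do not reflect DeepSeek's or Nvidia's actual implementations.

```python
import numpy as np

def moe_forward(x, experts, gate_weights, top_k=2):
    """Illustrative top-k mixture-of-experts routing (not any vendor's real code).

    x:            input vector, shape (d,)
    experts:      list of callables, each mapping (d,) -> (d,)
    gate_weights: gating matrix, shape (num_experts, d)
    """
    # A gating network scores every expert for this input.
    logits = gate_weights @ x
    # Only the top_k highest-scoring experts are activated (sparse compute).
    top = np.argsort(logits)[-top_k:]
    # Softmax over just the selected experts' scores.
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()
    # Weighted sum of the chosen experts' outputs; the other experts do no work.
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

# Toy usage: four "experts" that simply scale the input by different factors.
rng = np.random.default_rng(0)
d, num_experts = 8, 4
experts = [lambda v, s=s: s * v for s in (1.0, 2.0, 3.0, 4.0)]
gate = rng.standard_normal((num_experts, d))
y = moe_forward(rng.standard_normal(d), experts, gate, top_k=2)
print(y.shape)
```

Because only `top_k` of the experts run for any given input, a model can hold far more total parameters than it activates per query, which is why inference hardware throughput matters so much for this architecture.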
While Nvidia currently leads in inference performance, AMD is developing a competing multi-chip server slated for 2026 release. This development comes as the market for AI infrastructure becomes increasingly bifurcated between training and inference solutions.